Structural Features Extraction for Devnagari and Bangla Language Documents
Similar resources
Extraction of text-related features for condensing image documents
A system has been built that selects excerpts from a scanned document for presentation as a summary, without using character recognition. The method relies on the idea that the most significant sentences in a document contain words that are both specific to the document and have a relatively high frequency of occurrence within it. Accordingly, and entirely within the image domain, each page ima...
Definition Extraction using Linguistic and Structural Features
In this paper, a combination of linguistic and structural information is used for the extraction of Dutch definitions. The corpus used is a collection of Dutch texts on computing and e-learning containing 603 definitions. The extraction process consists of two steps. In the first step, a parser using a grammar defined on the basis of the patterns observed in the definitions is applied on the compl...
Specification of UNL Deconverter for Bangla Language
At present, the WWW is a powerful tool for communication and information interchange. With a simple mechanism, it is possible to access innumerable documents on a huge variety of topics from any place around the world. However, despite the abundance of information, languages very often cause severe problems. When most of the web pages today are written in a few of the most common languages like...
Structural Features of Chinese Language
The Chinese language differs from many Western languages in various structural features. It is not alphabetic. A large number of Chinese characters are ideographic symbols. The monosyllabic structure, the open nature of the vocabulary, the flexible wording structure with tones, and the flexibility in word ordering are good examples of the structural features of the Chinese language. It is believed ...
Zone-based Keyword Spotting in Bangla and Devanagari Documents
In this paper we present a word spotting system in text lines for offline Indic scripts such as Bangla (Bengali) and Devanagari. Recently, it was shown that a zone-wise recognition method improves word recognition performance over a conventional full-word recognition system in Indic scripts [29]. Inspired by this idea, we consider the zone segmentation approach and use middle zone information ...
Journal
Journal title: Indian Journal of Science and Technology
Year: 2015
ISSN: 0974-5645,0974-6846
DOI: 10.17485/ijst/2015/v8i13/56453